
First cut of the Transformation Playground Frontend and Backend #1190

Open · wants to merge 20 commits into base: main
Conversation

@chelma (Member) commented Dec 10, 2024

Description

  • Added an initial cut of the Transformation Playground backend
  • The user is able to start up the Django webserver and POST against the transforms/index/ path to create Python code that will transform input ES 6.8 Index Settings (with multi-type mappings) or ES 7.10 Index Settings into equivalent OpenSearch 2.X settings. The API returns the transform code, the result of invoking the transform against the user's input, and the validation results.
  • The backend takes the user's input JSON, runs it through Claude 3.5 Sonnet to generate some Python transformation code, loads the transform code, invokes the transform against the input, and creates/deletes each transformed Index setting against the OpenSearch cluster the user provided for testing.
  • The user is also able to spin up a quick React/Cloudscape frontend and hit the backend in their web browser. The frontend takes in the user's input JSON and requests a GenAI recommendation. They can clear their current transform as well. There are dropdowns for different configuration options, though currently the only one with multiple options is the Source, which supports both ES 6.8 and ES 7.10. The user is able to set static guidance for the GenAI recommendation to shape its output, as well as directly modify and test their own transformation code without involving GenAI.

Issues Resolved

Testing

  • Added unit tests
  • Ran manual tests. Example:
(venv) chelma@80a9970a4d02 tp_backend % python3 manage.py runserver
Performing system checks...

System check identified no issues (0 silenced).
December 10, 2024 - 19:14:58
Django version 5.1.4, using settings 'tp_backend.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
(venv) chelma@80a9970a4d02 LP04 % curl -X POST "http://127.0.0.1:8000/transforms/index/" -H "Content-Type: application/json" -d '
{
    "transform_language": "Python",
    "source_version": "Elasticsearch 6.8",
    "target_version": "OpenSearch 2.17",
    "input_shape": {
        "index_name": "test-index",
        "index_json": {
            "settings": {
                "index": {
                    "number_of_shards": 1,
                    "number_of_replicas": 0
                }
            },
            "mappings": {
                "type1": {
                    "properties": {
                        "title": { "type": "text" }
                    }
                },
                "type2": {
                    "properties": {
                        "contents": { "type": "text" }
                    }
                }
            }
        }
    },
    "test_target_url": "http://localhost:29200"
}'

{
    "output_shape": [
        {
            "index_name": "test-index-type1",
            "index_json": {
                "settings": {
                    "index": {
                        "number_of_shards": 1,
                        "number_of_replicas": 0
                    }
                },
                "mappings": {
                    "properties": {
                        "title": {
                            "type": "text"
                        }
                    }
                }
            }
        },
        {
            "index_name": "test-index-type2",
            "index_json": {
                "settings": {
                    "index": {
                        "number_of_shards": 1,
                        "number_of_replicas": 0
                    }
                },
                "mappings": {
                    "properties": {
                        "contents": {
                            "type": "text"
                        }
                    }
                }
            }
        }
    ],
    "transform_logic": "from typing import Dict, Any, List\nimport copy\n\n\"\"\"\nThis transformation function converts Elasticsearch 6.8 index settings to OpenSearch 2.17 compatible format.\nIt handles the removal of type-based mappings by creating separate indexes for each type.\n\"\"\"\n\ndef transform(source_json: Dict[str, Any]) -> List[Dict[str, Any]]:\n    result = []\n    index_name = source_json['index_name']\n    index_json = source_json['index_json']\n    \n    # Extract settings\n    settings = index_json.get('settings', {})\n    \n    # Extract mappings\n    mappings = index_json.get('mappings', {})\n    \n    # Create a separate index for each type\n    for type_name, type_mapping in mappings.items():\n        new_index_name = f\"{index_name}-{type_name}\"\n        new_index_json = {\n            'settings': copy.deepcopy(settings),\n            'mappings': type_mapping\n        }\n        \n        result.append({\n            'index_name': new_index_name,\n            'index_json': new_index_json\n        })\n    \n    return result",
    "validation_report": [
        "Attempting to load the transform function...",
        "Loaded the transform function without exceptions",
        "Attempting to invoke the transform function against the input...",
        "Invoked the transform function without exceptions",
        "The transformed output has 2 Index entries.",
        "Using target cluster for testing: http://localhost:29200",
        "Attempting to create & delete index 'test-index-type1' with transformed settings...",
        "Created index 'test-index-type1'.  Response: \n{\"acknowledged\": true, \"shards_acknowledged\": true, \"index\": \"test-index-type1\"}",
        "Deleted index 'test-index-type1'.  Response: \n{\"acknowledged\": true}",
        "Attempting to create & delete index 'test-index-type2' with transformed settings...",
        "Created index 'test-index-type2'.  Response: \n{\"acknowledged\": true, \"shards_acknowledged\": true, \"index\": \"test-index-type2\"}",
        "Deleted index 'test-index-type2'.  Response: \n{\"acknowledged\": true}"
    ],
    "validation_outcome": "PASSED"
}
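
For readability, the escaped transform_logic string in the response above decodes to the following Python (unescaped verbatim from the API output):

from typing import Dict, Any, List
import copy

"""
This transformation function converts Elasticsearch 6.8 index settings to OpenSearch 2.17 compatible format.
It handles the removal of type-based mappings by creating separate indexes for each type.
"""

def transform(source_json: Dict[str, Any]) -> List[Dict[str, Any]]:
    result = []
    index_name = source_json['index_name']
    index_json = source_json['index_json']

    # Extract settings
    settings = index_json.get('settings', {})

    # Extract mappings
    mappings = index_json.get('mappings', {})

    # Create a separate index for each type
    for type_name, type_mapping in mappings.items():
        new_index_name = f"{index_name}-{type_name}"
        new_index_json = {
            'settings': copy.deepcopy(settings),
            'mappings': type_mapping
        }

        result.append({
            'index_name': new_index_name,
            'index_json': new_index_json
        })

    return result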

The default view when you open the GUI, with an input ES 6.8 JSON filled in
Screenshot 2025-01-02 at 7 22 09 AM

The GenAI recommended transformation for that input JSON
Screenshot 2025-01-02 at 7 22 42 AM

The modal window for modifying the GenAI recommendations
Screenshot 2025-01-02 at 7 27 27 AM

The results of applying the user guidance in a new GenAI-recommended transform
Screenshot 2025-01-02 at 7 28 03 AM

Check List

  • New functionality includes testing
    • All tests pass, including unit tests, integration tests, and doctests
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@chelma changed the title from "First cut of the Transformation Playground backend" to "First cut of the Transformation Playground Frontend and Backend" on Dec 12, 2024
- Do not attempt to be friendly in your responses. Be as direct and succint as possible.
- Think through the problem, extract all data from the task and the previous conversations before creating a plan.
- Never assume any parameter values while invoking a tool or function.
- You may ask clarifying questions to the user if you need more information.
Member Author:

Oof, we definitely don't want this line in here. Remove.

@peternied (Member) left a comment:

Thanks for getting this in front of the team @chelma

I did not get a chance to refine the associated Jiras in advance of this change arriving - I expect there will be back and forth on a number of the comments I've raised.

.gitignore Outdated
Member:

While there is a README for this project, it doesn't have an architecture document; let's build one out similar to what we have for RFS (with scope adjusted).

Member Author:

Good callout. I have a bunch of existing docs that can be translated into this format.

Comment on lines +85 to +86
'ENGINE': 'django.db.backends.sqlite3',
'NAME': BASE_DIR / 'db.sqlite3',
Member:

What goes in this db?

@chelma (Member Author) commented Dec 13, 2024:

Good question. In the future, I expect we'll store:

  • The user-loaded inputs from the left panel (what if you have a snapshot or source cluster w/ 10k+ indices? Probably should be stored client-side), which we'll then display portions of in the client.
  • The transformation logic for each "input shape" the user has made manual changes to, along with the full validation history and the LLM conversation. This will enable us to deploy all the transformations in a bundle that replay/backfill can use; it will facilitate our (the team's) debugging efforts; it will provide a trackable history of what the LLM's actions were and how they were incorporated into the Cx migration (think of security owners); and it will facilitate LLM model selection/evaluation as well as LLM fine-tuning (we could have a collection of real migrations to tune with).
  • It may also store the validation test logic for each input shape. Imagine if, as part of an assessment process, we have an LLM propose additional tests we should be running for validation and then dynamically create and incorporate them. This would be really useful for validating behavior (does the target behave the way we want), not just API spec conformance (does it return a 2XX).
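
A minimal sketch of what such models might look like (purely hypothetical; the model and field names are illustrative and not part of this PR):

from django.db import models

class InputShape(models.Model):
    # The user-supplied index name and its settings/mappings JSON
    index_name = models.CharField(max_length=255)
    index_json = models.JSONField()

class TransformRecord(models.Model):
    # One transform per input shape, plus the history needed for debugging,
    # auditability, and LLM evaluation/fine-tuning
    input_shape = models.ForeignKey(InputShape, on_delete=models.CASCADE)
    transform_logic = models.TextField()
    validation_history = models.JSONField(default=list)
    llm_conversation = models.JSONField(default=list)
    created_at = models.DateTimeField(auto_now_add=True)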

Comment on lines +151 to +162
'transform_api_debug_file': {
    'level': 'DEBUG',
    'class': 'logging.FileHandler',
    'filename': 'logs/transform_api.debug.log',
    'formatter': 'verbose',
},
'transform_api_info_file': {
    'level': 'INFO',
    'class': 'logging.FileHandler',
    'filename': 'logs/transform_api.info.log',
    'formatter': 'verbose',
},
Member:

Double checking, does the debug log include the info level logs too? Is this in alignment with the logging working group session we ran?

Member Author:

Yeah, it includes both sets of logs. I don't remember the working group session; is there a link to its action items/artifacts?
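
(This behavior follows from Python logging semantics: a handler's level is a minimum threshold, so a DEBUG-level file handler also receives INFO records. A hypothetical loggers entry wiring one logger to both handlers, illustrative rather than the exact config in this PR:)

'loggers': {
    'transform_api': {
        # Records at DEBUG and above reach both handlers; each handler's
        # own 'level' then filters what lands in its file.
        'handlers': ['transform_api_debug_file', 'transform_api_info_file'],
        'level': 'DEBUG',
        'propagate': False,
    },
},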

Member:

Thanks for adding tests; let's make sure to update the CI.yml to run these tests so we have code coverage information as well.

Member Author:

Will investigate!

llm = ChatBedrockConverse(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0", # This is the older version of the model, could be updated
    temperature=0, # Suitable for straightforward, practical code generation
    max_tokens=4096,
Member:

How many tokens will we need? Can you frame how we choose this target?

Member Author:

In the future, I expect this to be fully configurable by the user - what if they want to use their own LLM, for example, rather than Bedrock? We'll need to figure out what the right level of configurability is for initial user feedback. On the token front - that is the maximum number of output tokens, not input. 4k tokens is a LOT of code; I would be highly surprised if we ever had a transform that needed anywhere close to that. For reference, the ~450 lines of text in the OS 2.7 "knowledge" file is only ~3500 tokens.
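
A minimal sketch of what that configurability might look like (hypothetical; the environment variable names are illustrative, and the defaults mirror the current hard-coded values):

import os
from langchain_aws import ChatBedrockConverse

llm = ChatBedrockConverse(
    # Each knob becomes overridable without a code change
    model=os.environ.get("TP_LLM_MODEL_ID", "anthropic.claude-3-5-sonnet-20240620-v1:0"),
    temperature=float(os.environ.get("TP_LLM_TEMPERATURE", "0")),
    max_tokens=int(os.environ.get("TP_LLM_MAX_TOKENS", "4096")),
)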

</Container>

{/* Testing/Output Column */}
<Container header="Testing & Output Panel">
Member:

How do you cycle back and forth between source / transformation logic / output?

@chelma (Member Author) commented Dec 13, 2024:

I haven't implemented this all yet (obviously), but here's the user journey I'm envisioning:

  • The user loads in their input shapes. Currently, that's just copy/pasting JSON into a text field, but soon I'd like to be able to load the contents of a snapshot into the Playground via a configurable process. Imagine a dialogue where you select the S3 location of your snapshot, something extracts the templates, indices, and a configurable number of documents (that's a whole workstream right there), and presents them in the left panel so you can see everything in your snapshot, then click on specific items to see their JSON, etc. Even once we've added the ability to read from snapshots, we'll want to leave open the possibility of adding manual input shapes. Example - the user doesn't have access to a snapshot of their production cluster but they do have access to the settings JSON and a few documents, which they can manually enter as shapes into the Playground for testing.
  • The user then uses the dropdowns to select what type of transformation they are performing. The "transform type" (Index vs. template vs. documents) dropdown will disappear, because it will be obvious from context (the user selected an index on the input panel, so...).
  • Once that is selected, we will have a pre-canned transform (we'll hand-craft and save, no GenAI) that is supplied based on those selections that will serve as a default. The default will be populated into the UI and pinned against that shape in the backend.
  • The user can then hit a "test" button to see how that default transform will work against their input shape (e.g. we kick off the validation process).
  • If the user wants to include functional tests in validation (e.g. actually creating/deleting indices against a real test cluster), they can go through a dialogue to set up and test a connection to their target cluster. Currently, that process is just "paste a URL into a text field and assume there's no auth to worry about".
  • If the user likes the results of the default, pre-canned transform and it passes validation - great! No further work needed. If not, they can either modify the transform directly in the UI and run the validation process again, or they can ask the GenAI assistant to modify the existing transform in some way (not currently implemented, but expect to be easy). Either way, the new transform is then pinned against the input shape in the backend so that we know there's something custom going on.
  • The user then selects a different input shape and goes through this process until everything is transforming as they desire, then they can either hit a button to "deploy" their transformed data/metadata to the target cluster (think for metadata migrations which are inherently "low scale") or "bundle" the transformations for inclusion into the Migrations Assistant backfill/replay processes (which are anything but "low scale"). For the "deploy" button, this is basically just skipping the cleanup step of the validation process we've already performed (e.g. take the transformed output and PUT it against the target, but don't delete it afterwards). For bundling, this is taking the stuff we have in the server-side DB, packaging it into a tarball/zip/whatever, and sticking it somewhere (S3?) so that the backfill and replayer processes can pick it up and load the transform objects.

Member:

This seems like 3 different experiences: sourcing input JSON, transformation editing, and viewing output JSON. Seems like these should be decoupled into different pages, but I'm not sure of the overall workflow.

Member Author:

I think there's a lot of benefit in visualizing the three things together; I outlined the user journey I'm imagining above [1]. Curious what you think after reading that.

[1] #1190 (comment)

</Container>

{/* Transformation Column */}
<Container header="Transformation Panel">
Member:

This seems like it should be refactored into its own control to reduce overall coupling

Member Author:

Would love more details! But yeah - this whole page is ripe for refactoring.

Comment on lines 17 to 20
const sourceVersionOptions: SelectProps.Options = [{ label: "Elasticsearch 6.8", value: "Elasticsearch 6.8" }];
const targetVersionOptions: SelectProps.Options = [{ label: "OpenSearch 2.17", value: "OpenSearch 2.17" }];
const transformTypeOptions: SelectProps.Options = [{ label: "Index", value: "Index" }];
const transformLanguageOptions: SelectProps.Options = [{ label: "Python", value: "Python" }];
Member:

Seems like this data should come from the API, or there should be a mapping of API data against the strings for visualization

Member Author:

+1. I think this comes down to a shared spec between the frontend and backend; this also applies to the shape of the requests/responses that are going over the wire. I know we've briefly discussed OpenAPI or something as a model format.
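
A minimal sketch of the kind of backend endpoint that could drive these dropdowns (hypothetical; the path, view name, and field names are illustrative, not an agreed-upon spec):

from rest_framework.decorators import api_view
from rest_framework.response import Response

@api_view(['GET'])
def transform_options(request):
    # The frontend would populate its dropdowns from this response instead of
    # hard-coding the strings; the values mirror what the PR currently supports.
    return Response({
        'source_versions': ['Elasticsearch 6.8', 'Elasticsearch 7.10'],
        'target_versions': ['OpenSearch 2.17'],
        'transform_types': ['Index'],
        'transform_languages': ['Python'],
    })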

Member:

Let's use the existing Django API server in console_api

Member Author:

I think there's a lot of benefit in separating this as a distinct project:

  • The Playground is conceptually only loosely coupled to the rest of the Migration Assistant. There's no reason the snapshot it's reading inputs from needs to come from a console snapshot create command. Similarly, there's no reason the target cluster used for validation (or we do low-scale deployment against) needs to have been created as a part of the Migration Assistant setup process. The best argument might be around the format of the transformation bundle being coupled to the Migration Assistant, but there's no reason we couldn't specify that format to be generic. That said - you don't need to care about the rest of Migration Assistant at all for the Playground to be a useful way of creating/testing these transformations.
  • The Playground is a very different experience than the general operator workflow. It's inherently interactive and cyclical in a way that running basic configuration commands is not.
  • The Playground is generically useful and extensible as a standalone product feature. At its core, it's a way to visualize data/metadata, with a GenAI-assisted process for modifying/transforming it, testing the modifications/transformations, and some mechanisms for deployment. Swapping out what specifically it uses as an input, tests against, or produces as output is inherently modular. There are already multiple other use cases we've discussed where this could apply in domains outside of Elasticsearch-to-OpenSearch migrations.

The key thing in my mind is that users of the Migration Assistant should have a cohesive overall experience; but I think tightly coupling the Playground to the Migration Assistant Operator API/GUI in a way that prevents re-use would be a mistake.

    logger.debug(f"Validation report entries:\n{validation_report.report_entries}")
except TestTargetInnaccessibleError as e:
    logger.error(f"Target cluster is not accessible: {str(e)}")
    return Response({'error': str(e)}, status=status.HTTP_400_BAD_REQUEST)

Check warning — Code scanning / CodeQL: Information exposure through an exception (Medium). Stack trace information flows to this location and may be exposed to an external user.
except Exception as e:
    logger.error(f"Testing process failed: {str(e)}")
    logger.exception(e)
    return Response({'error': str(e)}, status=status.HTTP_500_INTERNAL_SERVER_ERROR)

Check warning — Code scanning / CodeQL: Information exposure through an exception (Medium). Stack trace information flows to this location and may be exposed to an external user.
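
A minimal remediation sketch for these findings (hypothetical; the wrapper name and generic message text are illustrative): log the full exception server-side, but return only a generic message to the client.

import logging

from rest_framework import status
from rest_framework.response import Response

logger = logging.getLogger(__name__)

def run_test_safely(test_fn):
    try:
        return test_fn()
    except Exception:
        # The full stack trace stays in the server logs only
        logger.exception("Testing process failed")
        # The client sees a generic message instead of str(e)
        return Response(
            {'error': 'An internal error occurred while testing the transformation.'},
            status=status.HTTP_500_INTERNAL_SERVER_ERROR,
        )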